UNIBA: JIGSAW algorithm for Word Sense Disambiguation
Authors
Abstract
Word Sense Disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many relevant web-based applications, such as information retrieval and information extraction. This paper describes JIGSAW, a knowledge-based WSD system that attempts to disambiguate all words in a text by exploiting WordNet (http://wordnet.princeton.edu/) senses. The main assumption is that a specific strategy for each Part-Of-Speech (POS) is better than a single strategy. We evaluated the accuracy of JIGSAW in the SemEval-2007 task 1 competition (http://www.senseval.org/). This is an application-driven task, where the application is a fixed cross-lingual information retrieval system. Participants disambiguate text by assigning WordNet synsets; the system then has to expand the text to other languages, index the expanded documents, and run the retrieval for all the languages in batch. The retrieval results are taken as a measure of the effectiveness of the disambiguation.

1 The JIGSAW algorithm

The goal of a WSD algorithm is to assign a word w_i occurring in a document d its appropriate meaning or sense s, by exploiting the context C in which w_i is found. The context C for w_i is defined as a set of words that precede and follow w_i. The sense s is selected from a predefined set of possibilities, usually known as a sense inventory. In the proposed algorithm, the sense inventory is obtained from WordNet 1.6, according to the SemEval-2007 task 1 instructions.

JIGSAW is a WSD algorithm based on the idea of combining three different strategies to disambiguate nouns, verbs, adjectives and adverbs. The main motivation behind our approach is that the effectiveness of a WSD algorithm is strongly influenced by the POS tag of the target word. An adaptation of the Lesk dictionary-based WSD algorithm has been used to disambiguate adjectives and adverbs (Banerjee and Pedersen, 2002), an adaptation of the Resnik algorithm has been used to disambiguate nouns (Resnik, 1995), while the algorithm we developed for disambiguating verbs exploits the nouns in the context of the verb, as well as the nouns in both the glosses and the phrases that WordNet uses to describe the usage of a verb.

JIGSAW takes as input a document d = {w_1, w_2, ..., w_h} and returns a list of WordNet synsets X = {s_1, s_2, ..., s_k} in which each element s_i is obtained by disambiguating the target word w_i based on the information obtained from WordNet about a few immediately surrounding words. We define the context C of the target word as a window of n words to the left and another n words to the right, for a total of 2n surrounding words. The algorithm is based on three different procedures, one for nouns, one for verbs, and one for adjectives and adverbs, called JIGSAW_nouns, JIGSAW_verbs and JIGSAW_others, respectively. More details on each of these procedures follow; illustrative sketches are given below.
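To make the control flow concrete, here is a minimal sketch (not the authors' code) of JIGSAW's top level: each POS-tagged word gets a context window of n words per side and is routed to a POS-specific procedure. The window size n=4 and the fallback to WordNet's first-listed sense are assumptions for illustration only; the paper's actual strategies replace the placeholder.

```python
# Illustrative JIGSAW-style dispatch, sketched with NLTK's WordNet.
# Assumptions (not from the paper): n=4, and a first-sense placeholder
# standing in for JIGSAW_nouns / JIGSAW_verbs / JIGSAW_others.
from nltk.corpus import wordnet as wn

def context_window(words, i, n):
    """Context C: the n words to the left and n to the right of words[i]."""
    return words[max(0, i - n):i] + words[i + 1:i + 1 + n]

def first_sense(word, pos):
    """Placeholder for the three POS-specific procedures: fall back to
    WordNet's first-listed (most frequent) sense for the given POS."""
    senses = wn.synsets(word, pos=pos)
    return senses[0] if senses else None

def jigsaw(tagged, n=4):
    """tagged: list of (word, pos) pairs, pos in {'n','v','a','r'}.
    Returns one synset (or None) per word."""
    words = [w for w, _ in tagged]
    out = []
    for i, (w, pos) in enumerate(tagged):
        c = context_window(words, i, n)  # the real procedures use C
        out.append(first_sense(w, pos) if pos in ('n', 'v', 'a', 'r') else None)
    return out

print(jigsaw([("bank", 'n'), ("lend", 'v'), ("money", 'n')]))
```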
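For adjectives and adverbs, the paper uses an adaptation of Lesk (Banerjee and Pedersen, 2002). The sketch below is a drastically simplified gloss-overlap version, not the paper's adaptation: each candidate sense is scored by the word overlap between its WordNet gloss and the glosses of the context words. It requires NLTK with the WordNet corpus installed.

```python
# Simplified Lesk-style gloss overlap for adjectives/adverbs (illustrative;
# the paper's adapted Lesk is more elaborate).
from nltk.corpus import wordnet as wn

def gloss_bag(word):
    """Union of gloss tokens over all senses of `word`."""
    bag = set()
    for s in wn.synsets(word):
        bag.update(s.definition().lower().split())
    return bag

def lesk_adj_adv(target, context, pos=wn.ADJ):
    """Pick the sense of `target` whose gloss overlaps most with the
    glosses of the context words."""
    context_bag = set()
    for w in context:
        context_bag |= gloss_bag(w)
    best, best_score = None, -1
    for sense in wn.synsets(target, pos=pos):
        overlap = len(set(sense.definition().lower().split()) & context_bag)
        if overlap > best_score:
            best, best_score = sense, overlap
    return best

# Example: disambiguate the adjective "cold" in a weather context.
print(lesk_adj_adv("cold", ["wind", "winter", "temperature"]))
```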
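For nouns, the paper adapts Resnik (1995), which scores senses by information-content similarity to the senses of surrounding nouns. The following is a heavily simplified sketch under that idea, not the paper's procedure: each candidate sense of the target accumulates its best Resnik similarity to each context noun, using NLTK's Brown information-content file (a data-dependency assumption).

```python
# Simplified Resnik-style noun disambiguation (illustrative only).
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # needs NLTK's wordnet_ic data

def resnik_noun(target, context_nouns):
    """Score each sense of `target` by its summed best Resnik similarity
    to the senses of the context nouns; return the top-scoring sense."""
    best, best_score = None, -1.0
    for sense in wn.synsets(target, pos=wn.NOUN):
        score = 0.0
        for cw in context_nouns:
            sims = [sense.res_similarity(cs, brown_ic)
                    for cs in wn.synsets(cw, pos=wn.NOUN)]
            if sims:
                score += max(sims)
        if score > best_score:
            best, best_score = sense, score
    return best

print(resnik_noun("bank", ["money", "loan", "deposit"]))
```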
Similar papers
UNIBA: Combining Distributional Semantic Models and Sense Distribution for Multilingual All-Words Sense Disambiguation and Entity Linking
This paper describes the participation of the UNIBA team in the Task 13 of SemEval-2015 about Multilingual All-Words Sense Disambiguation and Entity Linking. We propose an algorithm able to disambiguate both word senses and named entities by combining the simple Lesk approach with information coming from both a distributional semantic model and usage frequency of meanings. The results for both ...
UNIBA: Combining Distributional Semantic Models and Word Sense Disambiguation for Textual Similarity
This paper describes the UNIBA team participation in the Cross-Level Semantic Similarity task at SemEval 2014. We propose to combine the output of different semantic similarity measures which exploit Word Sense Disambiguation and Distributional Semantic Models, among other lexical features. The integration of similarity measures is performed by means of two supervised methods based on Gaussian ...
EVALITA 2009 Lexical Substitution Task
This paper presents the participation of the University of Bari (UNIBA) at the EVALITA 2009 Lexical Substitution Task. The goal of the task is to substitute a word in a particular context providing the best synonyms which fit in that context. This task is a different way to evaluate Word Sense Disambiguation (WSD) algorithms. Indeed, understanding the meaning of the target word is necessary to ...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Linking Entities in Tweets
This paper describes the participation of the UNIBA team in the Named Entity rEcognition and Linking (NEEL) Challenge. We propose a knowledge-based algorithm able to recognize and link named entities in English tweets. The approach combines the simple Lesk algorithm with information coming from both a distributional semantic model and usage frequency of Wikipedia concepts. The algorithm perform...
UNIBA-SENSE @ CLEF 2009: Robust WSD task
This paper presents the participation of the semantic N-levels search engine SENSE at the CLEF 2009 Ad Hoc Robust-WSD Task. During the participation at the same task of CLEF 2008, SENSE showed that WSD can be helpful to improve retrieval, even though the overall performance was not exciting mainly due to the adoption of a pure Vector Space Model with no heuristics. In this edition, our aim is t...